Abstract: This paper investigates learning hierarchical
statistical activity models in indoor environments. The Abstract
Hidden Markov Model (AHMM) is used to represent
behaviors  in  stochastic  environments.  We  train  the  model 
using  both  labeled  and  unlabeled  data  and  estimate  the 
parameters  using  Expectation  Maximization  (EM).  Results 
are  shown  on  three  datasets:  data  collected  in  lab,  entry  way, 
and home environments. The results show that hierarchical
models outperform flat models when trajectories are difficult
to distinguish.

I.  Introduction 

Robots  increasingly  work  and  function  within 
environments  such  as  offices,  museums,  and  hospitals  that 
are  also  populated  by  human  beings.  In  order  to  fully 
interact  in  this  dynamic  environment,  it  is  necessary  for 
them  to  understand  the  movement  of  other  occupants 
throughout  the  environment.  A  first  step  toward  this 
understanding  of  other  agents,  be  they  human  or  robotic, 
is  the  ability  to  recognize  different  activities  based  on 
movements  throughout  the  space. 

Much  previous  research  in  mobile  robotics  has  explored 
localization  and  mapping  given  a  static  environment. 
Some  of  this  work  [1],  [2]  has  studied  methods  to  detect 
and  track  people  within  an  environment.  However,  in  this 
work  the  person  is  tracked  based  on  a  motion  model  of 
typical  human  movement:  the  intentions  or  the  higher 
level  tasks  of  the  person  are  not  modeled.  Other  work  has 
been done that does model intention [3], [4]. This work
clusters  similar  motion  trajectories  but  does  not  model 
activity  in  a  hierarchical  manner. 

Bui  [5]  introduces  the  Abstract  Hidden  Markov 
Model  (AHMM)  as  a  hierarchical  statistical  model 
for  representing  activities.  In  this  work  the  AHMM  was 
used  as  a  probabilistic  framework  for  plan  recognition, 
where  the  model  parameters  were  hand  coded  instead 
of learned. Murphy [6] investigated learning restricted
1-level AHMM models where the values of the top-level
nodes were observed.

In  this  paper  we  use  AHMMs  [5]  to  model  motion 
through  indoor  environments.  We  represent  the  model  as 
a  Dynamic  Bayesian  Network  (DBN)  and  use  Expectation 
Maximization  (EM)  to  estimate  the  parameters  of  the 
DBN  in  order  to  learn  behaviors.  This  work  extends 
previous  work  by  investigating  what  advantages  different 
levels of hierarchy provide, and compares the performance
of 1-level and 2-level AHMM models on both labeled and
unlabeled training instances.

The  rest  of  this  paper  is  organized  as  follows.  Section  II 
reviews  the  basics  of  the  AHMM.  Section  III  describes 
algorithms for inference and learning in the model. Section IV describes the
datasets  and  methods.  Section  V  presents  the  results  and 
Section  VI  summarizes  the  contributions  and  describes 
future  work. 

II.  Abstract  Hidden  Markov  Model 

The  AHMM  is  a  multi-scale  statistical  model  for 
representing  behaviors  in  stochastic,  noisy  situations. 
Hierarchy  plays  an  important  role  in  activities.  Imagine  a 
person  exiting  a  building  from  a  room  on  the  second  floor. 
This  behavior  can  be  broken  into  multiple  sub-behaviors. 
A  possible  sub-behavior  of  the  general  behavior  is  to  exit 
the  second  floor.  A  sub-behavior  of  this  behavior  could  be 
to  navigate  to  the  stairway.  As  part  of  this  behavior  the 
person  must  leave  the  room.  To  do  this,  the  person  must 
take  an  action,  in  this  case  a  step  in  the  right  direction.  The 
AHMM  provides  a  way  to  model  this  type  of  stochastic 
process  and  allows  the  robot  to  infer  what  the  person  is 
doing  based  on  its  observations  at  each  timestep. 

The  AHMM  provides  a  top  down  decomposition  for 
a  fixed  behavior.  A  behavior  maps  a  state  to  an  action. 
The  AHMM  provides  a  method  of  modeling  hierarchical 
behaviors.  In  hierarchical  behaviors  a  high-level  behavior 
can  call  a  more  refined  low-level  behavior  according  to 
some  distribution.  This  low-level  behavior  will  call  a 
lower-level  behavior.  This  process  continues  until  the  most 
primitive behavior possible is performed. In domains
with discrete actions the most primitive behavior is
a single action. When a low-level behavior terminates
in  some  state  then  the  parent  behavior  may  also  terminate 
with  some  probability  so  long  as  the  current  state  is  in  the 
set  of  destination  states  of  the  parent  behavior. 

A.  Representation 

An AHMM can be represented as a DBN as in Figure 1.
Edges between nodes represent dependencies. An AHMM
has five different types of nodes: $\pi_t^k$, $e_t^k$, $s_t$, $m_t$, and $o_t$. Let
$s_t$ represent the state of the agent at time $t$. Since the true
state of the agent is hidden, observations, $o_t$, which are a
stochastic function of the state, are required. Observations
are modeled as a mixture of Gaussians. $m_t$ represents
the mixture component of $s_t$. $\pi_t^k$ represents the abstract
behavior at level $k$ and time $t$. $e_t^k$ is the termination flag
for $\pi_t^k$. $e_t^k = \mathit{terminate}$ signifies the natural completion of
$\pi_t^k$.

B.  Transition  Model 

Each node represents a conditional probability distribution
(CPD) or table (CPT). The level 2 behavior at time $t$
depends upon the level 2 behavior, state, and termination
flag 2 at the previous timestep $t-1$. We define this as

$$
P(\pi_t^2 \mid \pi_{t-1}^2, s_{t-1}, e_{t-1}^2) =
\begin{cases}
\sigma(s_{t-1}, \pi_t^2) & \text{if } e_{t-1}^2 = \mathit{terminate} \\
1 & \text{if } e_{t-1}^2 = \mathit{continue} \text{ and } \pi_t^2 = \pi_{t-1}^2 \\
0 & \text{otherwise}
\end{cases}
$$

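The level 1 behavior at time $t$ depends upon the level 2
behavior at $t$ and the level 1 behavior, state, and termination
flag 1 at the previous timestep $t-1$. We define this as

$$
P(\pi_t^1 \mid \pi_t^2, \pi_{t-1}^1, s_{t-1}, e_{t-1}^1) =
\begin{cases}
\sigma_{\pi_t^1}(\pi_t^2, s_{t-1}) & \text{if } e_{t-1}^1 = \mathit{terminate} \\
1 & \text{if } e_{t-1}^1 = \mathit{continue} \text{ and } \pi_t^1 = \pi_{t-1}^1 \\
0 & \text{otherwise}
\end{cases}
$$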

Termination flag 1 at time $t$ depends upon the level 1
behavior and state at time $t$. We define this as

$$
P(e_t^1 \mid s_t, \pi_t^1) = \beta_{\pi_t^1}(s_t)
$$

Termination flag 2 at time $t$ depends upon the level 2
behavior, termination flag 1, and state at time $t$. We define
this as

$$
P(e_t^2 \mid e_t^1, \pi_t^2, s_t) =
\begin{cases}
\beta_{\pi_t^2}(s_t) & \text{if } e_t^1 = \mathit{terminate} \\
0 & \text{otherwise}
\end{cases}
$$

The state at time $t$ depends upon the level 1 behavior taken
at time $t$ and the state at time $t-1$.

$$
P(s_t \mid s_{t-1}, \pi_t^1) = B(s_{t-1}, \pi_t^1, s_t)
$$
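The transition model above can be read as a generative procedure for one time slice. The following is a minimal sketch of that reading, assuming small hypothetical table sizes, randomly filled tables, and a boolean encoding of the termination flags (True for terminate). The array names `sigma2`, `sigma1`, `B`, `beta1`, and `beta2` are illustrative stand-ins for the selection, transition, and termination distributions, and starting the chain as if both levels had just terminated is also an assumption.

```python
import numpy as np

def random_cpt(rng, *shape):
    """Random conditional table whose last axis sums to 1."""
    table = rng.random(shape)
    return table / table.sum(axis=-1, keepdims=True)

def ahmm_step(rng, params, pi2, pi1, s, e1, e2):
    """Sample slice t of a 2-level AHMM from the values at slice t-1."""
    sigma2, sigma1, B, beta1, beta2 = params
    n_pi2, n_pi1, n_s = sigma2.shape[1], sigma1.shape[2], B.shape[2]
    # Level 2 behavior: resample only if flag 2 signalled termination.
    if e2:
        pi2 = rng.choice(n_pi2, p=sigma2[s])
    # Level 1 behavior: resample under the current parent if flag 1 terminated.
    if e1:
        pi1 = rng.choice(n_pi1, p=sigma1[pi2, s])
    # State transition driven by the level 1 behavior.
    s = rng.choice(n_s, p=B[s, pi1])
    # Flag 1 terminates with probability beta1[pi1, s].
    e1 = bool(rng.random() < beta1[pi1, s])
    # Flag 2 can terminate only when flag 1 has terminated.
    e2 = e1 and bool(rng.random() < beta2[pi2, s])
    return pi2, pi1, s, e1, e2

def sample_trajectory(seed=0, n_pi2=2, n_pi1=3, n_s=4, length=20):
    """Sample hidden slices (pi2, pi1, s, e1, e2) from random toy tables."""
    rng = np.random.default_rng(seed)
    params = (
        random_cpt(rng, n_s, n_pi2),         # sigma2[s_prev, pi2]
        random_cpt(rng, n_pi2, n_s, n_pi1),  # sigma1[pi2, s_prev, pi1]
        random_cpt(rng, n_s, n_pi1, n_s),    # B[s_prev, pi1, s]
        rng.random((n_pi1, n_s)),            # beta1[pi1, s]
        rng.random((n_pi2, n_s)),            # beta2[pi2, s]
    )
    # Assumed start: both levels have "just terminated", so both are sampled.
    current = (0, 0, 0, True, True)
    slices = []
    for _ in range(length):
        current = ahmm_step(rng, params, *current)
        slices.append(current)
    return slices
```

In any sampled sequence, a behavior at either level changes only after its own termination flag was set in the previous slice, and flag 2 is never set unless flag 1 is set in the same slice.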


C.  Observation  Model 

The observation model signifies the probability of seeing
an $(x, y)$ position conditioned on a discrete hidden state. For
this application the observations are modeled as a mixture
of Gaussians. We explicitly model the mixture variable, $m$,
as can be seen in Figure 1. The CPDs for this model can
be written as follows.

For the observation nodes:

$$
P(o_t \mid s_t = i, m_t = m) = \mathcal{N}(o_t; \mu_{i,m}, \Sigma_{i,m})
$$

For the mixture node:

$$
P(m_t = m \mid s_t = i) = C(i, m)
$$
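Summing the two CPDs above over the mixture variable gives the likelihood of a position under a state, $P(o_t \mid s_t = i) = \sum_m C(i, m)\, \mathcal{N}(o_t; \mu_{i,m}, \Sigma_{i,m})$. The sketch below evaluates this quantity with NumPy. The toy parameter values and the names `C`, `means`, and `covs` are assumptions chosen for illustration only.

```python
import numpy as np

def gaussian_pdf(o, mean, cov):
    """Density of a multivariate Gaussian N(o; mean, cov)."""
    d = len(mean)
    diff = np.asarray(o, dtype=float) - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def observation_likelihood(o, i, C, means, covs):
    """P(o | s = i) = sum_m C[i, m] * N(o; means[i, m], covs[i, m])."""
    return sum(
        C[i, m] * gaussian_pdf(o, means[i, m], covs[i, m])
        for m in range(C.shape[1])
    )

# Hypothetical toy parameters: 2 states, 2 mixture components, 2-D positions.
C = np.array([[0.7, 0.3], [1.0, 0.0]])
means = np.array([[[0.0, 0.0], [1.0, 1.0]], [[5.0, 5.0], [0.0, 0.0]]])
covs = np.tile(np.eye(2), (2, 2, 1, 1))  # identity covariance everywhere
```

With a single mixture component per state, as used in the experiments, the sum reduces to one Gaussian per state.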
Fig. 1. DBN representation of a 2-level AHMM. The horizontal dashed
lines indicate levels of hierarchy. For this specific application we use
a mixture of Gaussians for the observation model. However, this is not
required by the model, thus we draw the links to and from the discrete
mixture nodes ($m_t$) with a dashed line.


III. Inference and Learning in AHMM Models

The input to our models is a set of trajectories, $\tau =
\{\tau_1, \ldots, \tau_N\}$. Each trajectory consists of a sequence of
observed positions $o_t = (x_t, y_t)$, which means that a trajectory
$\tau_i = \{o_1, o_2, \ldots, o_T\}$. $o_1$ is the starting position and $o_T$ is
the final destination. Using these trajectories we are able
to perform inference and learning in these models.

A.  Inference 

Inference  can  be  thought  of  as  querying  the  model.  If 
$X$ represents the node(s) of interest and $E$ is the evidence,
$P(X \mid E)$ is the query. In our model $E$ is the trajectory $\tau_i$.
$X$ can be a node, for instance $\pi_t^k$ or $s_t$, or a conjunction of
nodes that includes $\pi_t^k$. Currently we use the junction tree (jtree)
algorithm [7] for inference in the AHMM. Jtree is an
algorithm  for  exact  inference  and  calculates  the  marginals 
on  all  the  nodes  with  a  single  forward-backward  pass. 
However,  Bui  [5]  presents  a  Rao-Blackwellised  Particle 
Filter  for  approximate  inference  which  is  significantly 
more  efficient. 

There are different types of inference that can be
performed, depending upon the type of questions asked
of the model. If we return to our example there are
several questions we might ask. As the person leaves
the building we can ask which sub-behavior is currently
being performed based upon the movements up until
this time. Filtering is a type of inference that answers
this type of question by recursively computing the belief
state $P(X_t \mid o_1 \ldots o_t)$. For example, in our model we can
calculate the belief state of an abstract behavior given the
sequence of $(x, y)$ positions observed thus far by calculating
$P(\pi_t^k \mid o_1 \ldots o_t)$. If the person is walking down the stairs we
can ask what behavior was performed at some previous
time step given all the movement we have seen up until
the current time. This type of question is answered by
smoothing. Smoothing estimates past states while using
all the evidence up to the current time, $P(X_{t-l} \mid o_1 \ldots o_t)$
for $l > 0$. In our model, the abstract behavior performed
$l$ timesteps in the past, given all the current information,
is obtained by calculating $P(\pi_{t-l}^k \mid o_1 \ldots o_t)$. If the
person is currently in the hallway we can ask where
they will be 3 timesteps into the future given all the
information up until this time. Prediction estimates future
states given the evidence available at the current time
step, $P(X_{t+l} \mid o_1 \ldots o_t)$. Prediction allows us to predict what
abstract behavior the agent will perform at some point in
the future, $P(\pi_{t+l}^k \mid o_1 \ldots o_t)$, or to determine where
the agent will be in the future, $P(s_{t+l} \mid o_1 \ldots o_t)$.
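The three query types can be illustrated on a much simpler model than the AHMM. The sketch below assumes a flat hidden Markov chain with a single discrete hidden variable, which is a swapped-in simplification and not the junction tree inference used in this paper. The names `prior`, `A`, and `lik` are hypothetical: an initial distribution, a transition matrix, and a table of per-step observation likelihoods.

```python
import numpy as np

def forward_filter(prior, A, lik):
    """Filtering: alpha[t] = P(X_t | o_1..o_t) for a discrete hidden chain.

    prior[i] = P(X_1 = i), A[i, j] = P(X_{t+1} = j | X_t = i),
    lik[t, i] = P(o_t | X_t = i).
    """
    T, n = lik.shape
    alpha = np.zeros((T, n))
    belief = prior * lik[0]
    alpha[0] = belief / belief.sum()
    for t in range(1, T):
        # Propagate the previous belief, then weight by the new evidence.
        belief = (alpha[t - 1] @ A) * lik[t]
        alpha[t] = belief / belief.sum()
    return alpha

def smooth(prior, A, lik):
    """Smoothing: gamma[t] = P(X_t | o_1..o_T) via a backward pass."""
    alpha = forward_filter(prior, A, lik)
    T, n = lik.shape
    beta = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()  # rescale for numerical stability
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def predict(belief, A, steps):
    """Prediction: P(X_{t+steps} | o_1..o_t) from the filtered belief."""
    return belief @ np.linalg.matrix_power(A, steps)
```

At the final time step the smoothed and filtered beliefs coincide, since no later evidence exists to revise the estimate.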

B.  Learning 

Since  we  do  not  handcode  the  parameters  (CPTs  or 
CPDs)  of  the  models  it  is  necessary  to  perform  learning 
before  inference  can  be  done.  We  use  EM  to  learn  the 
parameters  of  our  model. 
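As a rough illustration of the alternation that EM performs, the sketch below runs EM on a deliberately simpler model: a two-component one-dimensional Gaussian mixture, where the hidden variable is the component that generated each point. This is not the AHMM training procedure. For the AHMM, the expectation step would be carried out by inference in the DBN. The function name `em_gmm_1d` and the initialization scheme are assumptions made for this example.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustration only)."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])   # crude initial means
    var = np.array([x.var(), x.var()])  # shared initial variance
    w = np.array([0.5, 0.5])            # uniform initial mixing weights
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = (w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2.0 * np.pi * var))
        log_liks.append(np.log(dens.sum(axis=1)).sum())
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the expected assignments.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var, log_liks
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a useful sanity check for any EM implementation.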

1) Likelihood: Let $\theta$ represent the model parameters.
Given the data $D$, the likelihood $P(D \mid \theta)$ can generally
be computed from the joint distribution $P(X, D \mid \theta)$ by
marginalizing (summing over) the hidden variables $X$ in
the model whose values are not defined by $D$. For the
DBN representation of an AHMM, the likelihood can be
computed by unrolling the DBN over the length of each
trajectory. However, more efficient inference methods are
available, including variable elimination and approximate
methods such as particle filtering [5]. Let $N$ denote the
number of trajectories representing the evidence $E$ and
$T_i$ represent the length of the $i$th trajectory. The joint
distribution of the overall 1-level AHMM network given
the parameters $\theta$ can be written as (the 2-level AHMM is
specified by extending this case to include the extra level
of hierarchy):

$$
\begin{aligned}
P(\pi^1, e^1, s, m, o \mid \theta) = \prod_{i=1}^{N} \Big[ & P(\pi_{i,1}^1)\, P(s_{i,1} \mid \pi_{i,1}^1)\, P(e_{i,1}^1 \mid s_{i,1}, \pi_{i,1}^1) \\
& P(m_{i,1} \mid s_{i,1})\, P(o_{i,1} \mid m_{i,1}, s_{i,1}) \\
& \prod_{t=2}^{T_i} P(\pi_{i,t}^1 \mid \pi_{i,t-1}^1, s_{i,t-1}, e_{i,t-1}^1)\, P(s_{i,t} \mid s_{i,t-1}, \pi_{i,t}^1) \\
& \qquad P(e_{i,t}^1 \mid s_{i,t}, \pi_{i,t}^1)\, P(m_{i,t} \mid s_{i,t})\, P(o_{i,t} \mid m_{i,t}, s_{i,t}) \Big]
\end{aligned}
$$

2) Expectation Maximization: EM is a framework for
maximum-likelihood parameter estimation with missing
data [8]. If the data $D$ are complete, i.e. if all the nodes in
the model are observed, the maximum likelihood estimate
of $\theta$ is $\hat{\theta} = \arg\max_\theta \log P(D \mid \theta)$, where $D$ is the complete
data. However, for training (1-level) AHMM models, the
data $D$ can be decomposed into an observed component
$S$ and a hidden component $U$ consisting of the nodes
$\pi^1$, $e^1$, $s$, and $m$ that are not observed. EM finds the
model parameters that maximize the expected value of the
log-likelihood, where the missing data are "filled in" by
using their expected value given the observed data. EM
consists of two steps, the Expectation step (E-step) and the
Maximization step (M-step). The E-step computes the
expected value of the log-likelihood,
$Q(\theta \mid \theta^{(i)}) = E_{U \mid S, \theta^{(i)}}[\log P(D \mid \theta)]$, where the expectation is
computed using $\theta^{(i)}$, the model parameter estimate from the
last iteration. For AHMMs, exact inference methods such
as the junction tree algorithm can be used to determine
the posterior distribution over the missing high-level
nodes, given the observed trajectory. Alternatively, faster
approximate methods based on particle filtering can also
be used [5]. The M-step finds a new setting for the
parameters such that $\theta^{(i+1)} = \arg\max_\theta Q(\theta \mid \theta^{(i)})$.

3) Training on Labeled Data: We ran experiments
where the models were trained with labels indicating
which behavior was being performed. These labels were
observed as the values of the highest level behaviors
during training. Figure 2 shows both the 1-level and
2-level AHMMs used during training. Nodes that are
shaded were observed during training. The models'
performance was tested by performing inference to
estimate the probability of the highest level behavior
given test observations, to see if the models distinguish the
sequences by predicting the correct label.

4) Training on Unlabeled Data: We also trained the
model on data where the highest level behavior was
unobserved. Figure 3 shows the two types of models used
when training with unlabeled data. When training with
unlabeled data, we tested the ability of the models to
cluster similar sequences together.

In order for the model to learn one label for each
sequence at the highest level of behavior modeled, it was
necessary to fix the CPTs such that the highest level
behavior never terminated. If the CPTs were not fixed the
models learned behaviors that changed frequently over
time. We fixed the CPTs for the 2-level AHMM as follows

$$
P(e_t^2 \mid e_t^1, \pi_t^2, s_t) =
\begin{cases}
1 & \text{if } e_t^2 = \mathit{continue} \\
0 & \text{otherwise}
\end{cases}
$$

$$
P(\pi_t^2 \mid \pi_{t-1}^2, s_{t-1}, e_{t-1}^2) =
\begin{cases}
1 & \text{if } e_{t-1}^2 = \mathit{continue} \text{ and } \pi_t^2 = \pi_{t-1}^2 \\
0 & \text{otherwise}
\end{cases}
$$

We fixed the CPTs for the 1-level AHMM as follows

$$
P(e_t^1 \mid s_t, \pi_t^1) =
\begin{cases}
1 & \text{if } e_t^1 = \mathit{continue} \\
0 & \text{otherwise}
\end{cases}
$$

$$
P(\pi_t^1 \mid \pi_{t-1}^1, s_{t-1}, e_{t-1}^1) =
\begin{cases}
1 & \text{if } e_{t-1}^1 = \mathit{continue} \text{ and } \pi_t^1 = \pi_{t-1}^1 \\
0 & \text{otherwise}
\end{cases}
$$

IV.  Experiments 

We  performed  several  different  types  of  experiments 
to  evaluate  the  AHMM  as  a  framework  for  hierarchical 
activity modeling. We limited our models to the 1-level
AHMM and the 2-level AHMM. We tested the models on
three different datasets. Experiments were run using the
Bayes Net Toolbox for Matlab (BNT) [9].


Fig. 2. Models trained with labeled data. The shaded nodes are observed.
(a) 1-level AHMM with observed level 1 behavior nodes.
(b) 2-level AHMM with observed level 2 behavior nodes.

Fig. 3. Models trained with unlabeled data. The shaded nodes are
observed. (a) 1-level AHMM with unobserved level 1 behavior nodes.
(b) 2-level AHMM with unobserved level 2 behavior nodes.

A.  Lab  Data 

Our  first  dataset  was  collected  using  Hema,  a  B21r  robot 
equipped with a laser range finder. Hema was positioned
so  she  could  view  most  of  the  activity  occurring  within 
the  hallway  of  a  lab  environment.  Laser  readings  were 
collected  using  CARMEN  [10].  Figure  4  shows  the  lab 
environment:  Hema  was  positioned  at  the  end  of  the  long 
corridor.  Laser  scans  of  people  walking  along  different 
trajectories  through  the  lab  were  collected.  An  example 
trajectory  can  be  seen  in  Figure  4(b). 

The  laser  scans  were  processed  to  extract  the  (x,y) 
position of each person walking. This was achieved by
first computing the difference between the scans at times $t$
and $t'$ and finding the scan angle at which the greatest change
between the two scans occurs. The position was then calculated
using the robot's localized position on the map. Every
third time slice was used to smooth the trajectory.
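The scan-differencing step can be sketched as follows. This is only one plausible reading of the procedure, under several assumptions: the beam layout (`start_angle`, `angle_step`), the pose format `(x, y, heading)`, and the choice to take the range from the more recent scan are all hypothetical and are not specified in the text.

```python
import math

def moving_point(scan_prev, scan_cur, robot_pose,
                 start_angle=-math.pi / 2, angle_step=math.radians(1.0)):
    """Estimate the (x, y) map position of the largest change between two scans.

    scan_prev, scan_cur: equal-length lists of range readings in meters.
    robot_pose: (x, y, heading) of the robot on the map, heading in radians.
    start_angle, angle_step: assumed beam layout relative to the heading.
    """
    # Per-beam absolute change in range between the two scans.
    diffs = [abs(a - b) for a, b in zip(scan_prev, scan_cur)]
    # Index of the beam with the greatest change.
    beam = max(range(len(diffs)), key=diffs.__getitem__)
    bearing = robot_pose[2] + start_angle + beam * angle_step
    r = scan_cur[beam]  # assumption: use the range from the newer scan
    return (robot_pose[0] + r * math.cos(bearing),
            robot_pose[1] + r * math.sin(bearing))

def subsample(trajectory, every=3):
    """Keep every third time slice of a trajectory."""
    return trajectory[::every]
```

A fuller implementation would need to handle the case where the largest change comes from a person leaving a beam rather than entering it, which this sketch ignores.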

There  were  6  different  classes  of  sequences  collected  in 
the  lab.  The  models  were  trained  using  3  instances  of  each 
sequence.  The  test  set  consisted  of  three  previously  unseen 
instances.  The  sequences  differed  in  length,  consisting  of 
anywhere  from  6  to  24  (x,y)  pairs.  Figure  4(a)  shows  the 
18  training  sequences  plotted  in  the  lab  environment. 

1) Model Definition: We found that setting the observation
model to one mixture component gave the best
performance. All models were initialized using six values
for highest level behavior multinomial nodes and 50 values
for the discrete multinomial state nodes. Often, far fewer
states were needed to represent the data, as seen in Figure
6 where only 15 states were used. For the experiments
with the 2-level AHMM, level 1 behavior nodes were
initialized to have 15 values. The highest level behaviors
were initialized to have a uniform probability. The means
of the Gaussians were initialized to be random points in
the data and the covariance matrices were initialized to
identity. All other parameters were initialized randomly.

Fig. 4. Data collection within the lab environment. (a) The (x,y)
positions of the 18 training samples plotted in the lab environment.
(b) An example of a single trajectory. In this example the person walks
from one cube near the door toward the robot until they enter
another cube.

B.  Entryway  Data 

One  of  the  characteristics  of  the  data  in  the  lab 
environment  was  that  the  trajectories  overlapped  for  most 
of  their  duration.  Data  were  collected  in  the  entryway 
of  the  2nd  floor  of  the  computer  science  building  where 
typical  paths  of  motion  were  more  distinct.  The  data  are 
shown in Figure 5. Data collection and processing were
the same as for the lab data in the previous section.

The  dataset  consisted  of  eight  different  types  of 
sequences.  The  training  set  consisted  of  eight  instances 
of  each  sequence;  the  test  set  contained  2  instances.  The 
sequences differed in length, consisting of between 15 and
30 (x,y) pairs.

1) Model Definition: Once again we found that setting
the observation model to one mixture component gave
the best results. All models used eight values for the
highest level behavior nodes and 60 values for state nodes,
although approximately 30 were typically used. For the
experiments with the 2-level AHMM, level 1 behaviors
had 25 possible values. The highest level behaviors were
initialized uniformly. As above, the means of the Gaussians
were initialized to be random points in the data and the
covariance matrices were initialized to identity. All other
parameters were initialized randomly.

Fig. 5. Data collected in the entry to the 2nd floor of the UMass CS
department. The data consists of 8 different classes of trajectories. The
location of the robot is indicated by the filled circle.

C.  Home  Data 

We  also  ran  experiments  on  data  collected  by  Bennewitz 
et al. [4]. The data consist of sequences of a person
moving  around  a  house.  The  data  were  collected  using 
three  Pioneer  I  robots  equipped  with  laser  range  finders. 

This data set has 11 different sequences with three
instances  for  each  sequence  type.  Due  to  the  small 
amount  of  data,  we  used  cross-validation  to  evaluate  the 
performance  of  the  models. 

1) Model Definition: We found that setting the observation
model to a single mixture component gave the
best results. All models had 11 values for the highest
level behavior nodes and state nodes had 50 values, of
which approximately 17 were used. For the experiments
with  2-level  AHMMs,  level  1  behaviors  could  take  on 
17  values.  The  highest  level  behaviors  were  initialized 
with  uniform  probability.  Once  again,  the  means  of  the 
Gaussians  were  initialized  to  be  random  points  in  the  data 
and  the  covariance  matrices  were  initialized  to  identity.  All 
other  parameters  were  initialized  randomly. 

V.  Results 

The performance of each model was determined using
the percent of test sequences that were correctly classified
when the model was run on unseen test data. These results
are shown in Table I. This table shows that in all cases
the 2-level AHMM performs at least as well as
the 1-level AHMM. The models trained with labeled data
learn to distinguish the trajectories better than the models
trained with unlabeled data. The results for the unlabeled
data show how the levels of hierarchy make a difference.
The 2-level AHMM performed better in the cases where
there was more overlap between the trajectories, such
as the lab data, where the overlap made it more difficult to
distinguish between the trajectories. The entryway data
has some overlap but overall the trajectories differ over
much of their duration. It may be possible to gain better
prediction results in the models trained with unlabeled
data by biasing the values of the parameters and hidden
nodes.

Fig. 6. The learned observation distributions $P(o_t \mid s_t, m_t)$. Each ellipse
represents the covariance matrix of the Gaussian for each given state $s_t$.
The Gaussians are overlaid on the (x,y) points that make up the data.
This set of states was learned using a 2-level AHMM.

Fig. 7. The learned observation distributions $P(o_t \mid s_t, m_t)$. Each ellipse
represents the covariance matrix of the Gaussian associated with a given
state $s_t$. The Gaussians are overlaid on the (x,y) points that make up the
data. This set of states was learned using a 2-level AHMM.

After training, the observation model $P(o_t \mid s_t, m_t)$
was plotted on top of the (x,y) positions of the data.
The observation model clustered areas where motion
took place within the environment. Figure 6 shows the
observation model for the lab data and Figure 7 shows the
same plot for the entryway data.

We also performed inference to test the models' ability to
predict higher level behavior. Figure 8 shows the results
of filtering in a 2-level AHMM trained on labeled data
for a trajectory in the entryway. The plot shows the
probability of the level 2 behavior at each time given the
current sequence of observations, $P(\pi_t^2 \mid o_1 \ldots o_t)$. Figure 9
shows the same results for the model trained with
unlabeled data. Both graphs show that the probability of
the correct class goes up as the trajectory progresses,
distinguishing itself from other trajectories that share
overlapping portions.


TABLE I

Percentage of test sequences correctly classified after
training. Random gives results for guessing with equal
probability for each behavior.

Model                                           Lab      Entryway   Home
Random                                          16.67%   12.5%      9.09%
1-level AHMM with unobserved level 1 behavior   66.67%   75%        57.57%
1-level AHMM with observed level 1 behavior     100%     100%       100%
2-level AHMM with unobserved level 2 behavior   83.33%   75%        60.60%
2-level AHMM with observed level 2 behavior     100%     100%       100%
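The Random row of Table I follows from the number of behavior classes in each dataset: guessing uniformly among $n$ classes is correct with probability $1/n$. The short sketch below reproduces that row. The function and variable names are illustrative, and the class counts are those given in the dataset descriptions.

```python
def random_baseline(n_classes):
    """Expected accuracy (in percent) of guessing uniformly among n_classes."""
    return 100.0 / n_classes

# Number of behavior classes per dataset, as stated in Section IV.
classes = {"Lab": 6, "Entryway": 8, "Home": 11}
baselines = {name: round(random_baseline(k), 2) for name, k in classes.items()}
```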

Fig. 8. The probability of the level 2 behavior at each time given the
current sequence of observations, $P(\pi_t^2 \mid o_1 \ldots o_t)$, plotted for one trajectory
recorded in the entryway. The model was a 2-level AHMM trained with
labeled data.


VI.  Conclusion  and  Future  Work 

In  this  paper  we  presented  a  hierarchical  approach 
to  modeling  motion  behavior  in  an  indoor  environment. 
We compared 1-level and 2-level AHMMs, where the
parameters of the models were learned using EM. We
show that hierarchical models outperform flat models in
cases when classification is especially difficult. We also
compared supervised and unsupervised learning in
these models.

There are several areas for future work. Currently
we have only investigated the case where one person
is moving through the environment; extending it
to the multi-agent case can be addressed using the
multi-agent AHMM model proposed in [11]. Other open
questions involve structure learning and model selection
for AHMMs. We assumed that the numbers of abstract
behaviors and states were known, as well as the number
of levels in the hierarchy. Efficient approaches to model
selection for AHMMs remain an open problem to be
investigated. Parameter estimation using exact inference
is expensive. We are currently exploring approximate
techniques. Further experiments comparing 1-level and
2-level models should be performed to better determine
conditions where hierarchy helps improve classification
accuracy.

Fig. 9. The probability of the level 2 behavior at each time given the
current sequence of observations, $P(\pi_t^2 \mid o_1 \ldots o_t)$, plotted for one trajectory
recorded in the entryway. The model was a 2-level AHMM trained with
unlabeled data.